Address decoding circuit

ABSTRACT

A decoder is disclosed that is used to select an area of address space in an Integrated Circuit. The decoder uses a hardware shifting module that performs shift operations on constants. Such a structure reduces an overall area consumption of the shifting module. Additionally, the decoder can perform a multi-bit shift operation in a single clock cycle.

BACKGROUND

A System on Chip (SoC) is a type of integrated circuit (IC) that combines many or all components of a computer into a single chip. Traditionally, SoCs partition the address space into different regions (DRAM regions, internal chip addresses, bus addresses, etc.). Each transaction in the internal SoC fabric is address-decoded in order to identify the required destination, and the transaction is routed accordingly. Typically, an address decoder can be used that has address bits as inputs and selection lines of the destinations as outputs. For example, the SoC can include multiple master/slave combinations and each master can include a separate decoder that selects the appropriate slave using the selection lines.

Address decoding typically uses inequality comparators, wherein two inequality comparators are used per slave window. Largely because of the inequality comparators, such decoders are very area expensive. With the large number of slaves in a design, decoder inefficiency has become problematic in terms of overall area usage in a SoC or other integrated circuit.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a system diagram showing a decoder hardware design for selecting one of a plurality of memory regions based on an input transaction address; the decoder hardware design is area optimized and does not use inequality comparators.

FIG. 2 shows an integrated circuit in which the decoder hardware of FIG. 1 can be used.

FIG. 3 shows a particular embodiment of decoder hardware for selecting one of a plurality of memory regions based on an input transaction address.

FIG. 4 shows an embodiment of shifting hardware that can be used in the decoder hardware.

FIG. 5 is a flowchart of a method for decoding a transaction address according to one embodiment.

FIG. 6 is a flowchart of a method for decoding a transaction address with additional process blocks that can be used with the FIG. 5 embodiment.

FIG. 7 depicts a generalized example of a suitable computing environment in which the described innovations may be implemented.

DETAILED DESCRIPTION

Decoders that include inequality comparators are area expensive. Shift operations on input variables in a SoC or other integrated circuit can also be expensive in terms of area. However, when shift operations are performed on constants, area consumption is comparatively lower. Thus, it is desirable to provide a decoder that performs a multi-bit shift operation in one clock cycle and whose shifting hardware operates on a constant. The single shift operation, together with shifting hardware that operates on a constant, reduces the area consumed by the shifting hardware.

FIG. 1 shows a decoder 102 that receives a transaction address 110 and other inputs 111 (described further below) and that provides a selection signal 112 corresponding to which region of an address space 120 has been selected. The transaction address 110 typically has a first portion used to generate the selection signal 112 and a second portion that represents an offset into the selected memory region, wherein the sizing of the first and second portions changes for each region of the address space. The address space 120 defines a range of discrete addresses that can correspond to a memory cell, a peripheral device, a disk sector, an I/O device or other logical or physical entity. Each region has a corresponding base address and window size. For example, a DRAM high region, shown at 122, has a base address of 01_0000_0000 and a window size of 4 Gigabytes so that the next region of memory starts at 02_0000_0000. Although seven different regions are shown for the address space 120, any number of regions can be used. The illustrated regions include 1) DRAM high, reclaim address space; 2) DRAM high; 3) a boot sector; 4) hardware engine 1; 5) hardware engine 2; 6) PCIe bus addresses; and 7) DRAM low. The base addresses for each region are shown at 124, and the addresses chosen are specific to the particular implementation. In the illustrated embodiment, the address space 120 is an operating system view of memory in an integrated circuit in which the decoder 102 sits. However, different configurations of the address space can be used.
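As a small arithmetic check of the DRAM high example, the start of the next region follows from the base address plus the window size. The sketch below assumes the addresses shown in FIG. 1 are hexadecimal and uses a hypothetical `Region` record; neither the name nor the field layout comes from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical record for one region of the address space."""
    name: str
    base: int          # base address of the region
    window_size: int   # number of bytes covered by the region

    @property
    def end(self) -> int:
        """First address past the region (exclusive upper bound)."""
        return self.base + self.window_size


GIB = 1 << 30
dram_high = Region("DRAM high", base=0x01_0000_0000, window_size=4 * GIB)
next_region_base = dram_high.end
print(hex(next_region_base))  # 0x200000000
```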

Details of the decoder 102 are shown in the decoder logic 130. Each region of the address space 120 has similar decoder logic 130 associated therewith. For example, if the illustrated decoder logic 130 is associated with the DRAM high region 122, then a hit/miss output 134 indicates a hit when the transaction address 110 is pointing within that region; otherwise it indicates a miss. The decoder logic 130 includes shifting hardware 140, XOR or XNOR logic 142, and combinatorial logic 144. The shifting hardware 140 performs a bitwise operation that shifts all bits of an operand either right or left. The shifting hardware 140 can shift by any number of bit positions in a single clock cycle and it is the only shifting operation needed by the decoder 102. Thus, the shifting hardware 140 can shift by 1 bit, 2 bits, or N bits (where N is limited by the size of the shifting hardware). The shifting hardware 140 uses a constant value 150 input as the operand. The constant value 150 can be a bitwise logical 1 (e.g., 11111 . . . 1) and the shifting hardware 140 can shift left. A logical 0 is shifted in from the right as logical 1's are shifted out. The other inputs 111 include a window size signal 152 and a base address of the slave region 160. The number of bit positions that the shifting hardware shifts is determined by the window size signal 152, which is associated with a window size of the region. Continuing with the above example, the window size can be associated with the size of the DRAM high region 122. An output of the shifting hardware 140 is a mask signal 154. As further described below, the shifting hardware 140 only performs a single shift operation in one clock cycle so that the decoder logic 130 can generate a hit/miss output 134 in one clock cycle. The base address of the slave region 160 and the transaction address 110 are input into the XOR or XNOR logic 142. An example base address is 01_0000_0000 for the DRAM high region 122. The output of the XOR/XNOR logic 142 is combined with the mask signal 154 in the combinatorial logic 144. In one example, the combinatorial logic 144 can be a combination of an AND gate and an OR gate. Other logic combinations can be used. The resultant output 134 indicates whether the transaction address 110 is within the associated region of address space. For example, if the transaction address is at least 01_0000_0000 and less than 02_0000_0000, and the decoder logic 130 is associated with the DRAM high region 122, then the hit/miss output 134 indicates a hit. Otherwise, a miss is indicated.
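The data path of the decoder logic 130 can be modeled in a few lines. The following is a minimal sketch, not the disclosed hardware: the function name `region_hit`, the 40-bit address width, and the reading of the example addresses as hexadecimal are assumptions made for illustration.

```python
ADDR_WIDTH = 40  # assumed address width; the disclosure does not fix one


def region_hit(addr: int, base: int, shift_select: int,
               width: int = ADDR_WIDTH) -> bool:
    """Return True when addr falls in the region starting at base.

    shift_select is the number of bit positions by which the all-ones
    constant is shifted left (log2 of the window size).
    """
    all_ones = (1 << width) - 1                   # constant value 150
    mask = (all_ones << shift_select) & all_ones  # mask signal 154
    diff = (addr ^ base) & all_ones               # XOR logic 142
    return (diff & mask) == 0                     # AND, then collapse


DRAM_HIGH_BASE = 0x01_0000_0000
DRAM_HIGH_SHIFT = 32  # 4 GB window = 2**32 addresses

print(region_hit(0x01_8000_0000, DRAM_HIGH_BASE, DRAM_HIGH_SHIFT))  # True
print(region_hit(0x02_0000_0000, DRAM_HIGH_BASE, DRAM_HIGH_SHIFT))  # False
```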

The indicated address space 120 can have seven different decoders 130 that each receive the transaction address 110 in parallel. Each decoder is assigned to one of the regions in the address space 120. Thus, for each transaction address, one of the decoders indicates a hit and selects the corresponding region of the address space. A simple example is shown below for a decoder similar to decoder 102 of FIG. 1, but for three different address regions called slave 0, slave 1, and slave 2. Three different decoder configurations are aligned in parallel and each receives the transaction address, which in this example is 10010. A bitwise comparison of the base address and transaction address is performed by an XOR gate in each decoder. The Tmp mask field is a constant input of 11111 and acts as the operand into the shifting hardware. The shift left mask is generated by shifting the operand 3 bits for slave 0, corresponding to a window size of 8 (2³ = 8), 2 bits for slave 1 (a window size of 4), and 3 bits for slave 2. Thus, the following formula can be used to determine the number of bits that the shifting hardware is shifted:

2^(shifting select) = window size

The shifting select exponent is a representation of a number of addresses in the region of address space. Notably, different window sizes can be used for the different regions of the address space.
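Solving the formula for the shifting select gives the base-2 logarithm of the window size. The helper below is a hypothetical illustration (the name `shifting_select` is not from the disclosure) and assumes the window size is an exact power of two.

```python
def shifting_select(window_size: int) -> int:
    """Return n such that 2**n == window_size.

    Raises ValueError when window_size is not a power of two.
    """
    if window_size <= 0 or window_size & (window_size - 1):
        raise ValueError("window size must be a power of two")
    return window_size.bit_length() - 1


print(shifting_select(8))  # 3, as for slave 0 and slave 2
print(shifting_select(4))  # 2, as for slave 1
```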

The following table shows the operations for the decoders:

Slave     Base Addr [binary]   Trans Addr   XOR between Trans Addr and Base Addr   Tmp Mask   Shift Left Mask
Slave 0   00000                10010        10010                                  11111      11000
Slave 1   01000                10010        11010                                  11111      11100
Slave 2   10000                10010        00010                                  11111      11000

As shown below, the shift-left mask is applied to the output of the XOR gate. More particularly, a bitwise AND operation is performed on the shift-left mask and the output of the XOR gate. The result column can be considered an intermediate result or a final result. A result of all 0's, as obtained for Slave 2, is indicative of a hit.

Slave     XOR between Trans Addr and Base Addr   Shift Left Mask   Result
Slave 0   10010                                  11000             10000
Slave 1   11010                                  11100             11000
Slave 2   00010                                  11000             00000

The intermediate result can be converted to a single bit by passing the result through an OR gate and an inverter to obtain a final result, as shown below. As indicated, the transaction address results in selection of an address region corresponding to slave 2.

Slave     Result   Collapse (OR)   NOT
Slave 0   10000    1               0
Slave 1   11000    1               0
Slave 2   00000    0               1
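The three tables can be reproduced programmatically. The sketch below uses the base addresses, shift amounts, and transaction address from the example; the variable names and the 5-bit width constant are chosen here for illustration.

```python
WIDTH = 5
TMP_MASK = (1 << WIDTH) - 1  # constant operand 11111
TRANS_ADDR = 0b10010

# (base address, number of bits shifted) per slave, taken from the example
SLAVES = {
    "Slave 0": (0b00000, 3),
    "Slave 1": (0b01000, 2),
    "Slave 2": (0b10000, 3),
}

rows = {}
for name, (base, shift) in SLAVES.items():
    xor = TRANS_ADDR ^ base                # bitwise comparison
    mask = (TMP_MASK << shift) & TMP_MASK  # shift-left mask
    result = xor & mask                    # intermediate result
    collapse = int(result != 0)            # OR of all result bits
    hit = collapse ^ 1                     # inverter
    rows[name] = (xor, mask, result, collapse, hit)
    print(f"{name}  {xor:05b}  {mask:05b}  {result:05b}  {collapse}  {hit}")
```

Only the Slave 2 row ends with a final value of 1, matching the selection described in the example.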

FIG. 2 illustrates an embodiment of an Integrated Circuit (IC) 200 having a master/slave configuration with multiple master modules 210, 212, 214 coupled in parallel. As indicated by dots 216, any number of master modules can be used. The master modules can include any desired hardware, but each includes a decoder 220, 222, and 224, respectively. The decoders 220, 222, and 224 can be equivalent hardware except for a base address and a window size parameter that are applied to each decoder. The decoders 220, 222, 224 can be coupled to a bus that is also coupled to multiple slaves 230, 232, 234, and 236. One of the slaves is enabled in response to the selection signals of the decoders 220, 222, and 224. The transaction address offset is then used to access (read from or write to) the selected slave address. The decoders 220, 222, 224 can be similar to the decoder logic 130 of FIG. 1.

In an alternative embodiment, each decoder 220, 222, and 224 can be coupled to one slave using a select line (instead of the bus). The select line can be pulled inactive and become active only upon a hit signal from the decoder. In this way, the slave can be enabled and the offset of the transaction address can be used to read from or write to the selected address corresponding to the transaction address.
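Running the per-region decoders side by side yields one select line per slave, and the bits cleared by the mask give the offset into the selected region. The sketch below is one way to model this under stated assumptions: the function `decode`, the tuple layout, and taking the offset as the transaction address ANDed with the inverted mask are illustrative choices, not details from the disclosure.

```python
from typing import List, Optional, Tuple


def decode(addr: int, regions: List[Tuple[int, int]],
           width: int) -> Tuple[List[int], Optional[int]]:
    """Model decoders that all see the same transaction address.

    regions holds (base address, shift amount) pairs, one per slave.
    Returns the select lines (1 = hit) and the offset into the
    selected region, or None when no decoder reports a hit.
    """
    all_ones = (1 << width) - 1
    selects = []
    offset = None
    for base, shift in regions:
        mask = (all_ones << shift) & all_ones
        hit = ((addr ^ base) & mask) == 0
        selects.append(int(hit))
        if hit and offset is None:
            offset = addr & ~mask & all_ones  # bits below the window boundary
    return selects, offset


regions = [(0b00000, 3), (0b01000, 2), (0b10000, 3)]
selects, offset = decode(0b10010, regions, width=5)
print(selects, bin(offset))  # [0, 0, 1] 0b10
```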

FIG. 3 shows a detailed view of the decoders 220 and 224 of FIG. 2 (decoder 222 has a similar structure). In decoder 220, a window size for slave 0 can be input into an optional −1 operation shown at 310. In some embodiments, it is desirable to subtract 1 from the size parameter. However, the −1 operation can be removed, if desired, as indicated by the dashed lines of box 310. The window size (or the window size −1) is input into shifting hardware 320 as a select input that selects the number of bits by which an input operand is shifted. Typically, the window size is an exponent of base 2 that can be used to compute a total number of addresses within an address region. The input operand is shown as a constant 11111 . . . 11 that can be supplied from a multibit bus. Other constant values can be used. The number of bits of the constant should be large enough to support a maximum select input of the shifting hardware 320. The output of the shifting hardware 320 is a mask output. In parallel with the shifting hardware 320, a first hardware logic 330 receives a base address for slave 0 and the transaction address. The first hardware logic 330 can be a bitwise logical XOR gate that performs a bit-by-bit comparison of the base address for slave 0 and the transaction address. The output of the first hardware logic 330 is input into a second hardware logic 340, which logically combines the mask output from the shifting hardware 320 and the first hardware logic output. The second hardware logic 340 can include a bitwise logical AND gate that combines both inputs and produces an output that is indicative of whether the transaction address is within the region of address space (i.e., the slave 0 region). A logical NOR gate 350 can combine (also called collapse) all of the bits to produce a single-bit hit/miss output. The NOR gate 350 can be replaced with an OR gate, an AND gate or a NAND gate. The hit/miss output is also indicative of whether the transaction address is within the slave 0 region of address space.

The decoder 224 can have a similar structure to decoder 220. However, the window size and the base address are for slave N (wherein N is any number). The transaction address is the same transaction address supplied to each decoder in parallel, and only one of the decoders typically has a hit, while the others have a miss. The decoders 220, 224 notably use a shift left of a constant operand by a number of bits dictated by a window size. Such a structure provides a considerable area advantage over shift registers that do not operate on a constant. Likewise, the above structure uses considerably less area than decoders that use inequality comparators.
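The comparison with inequality comparators rests on the masked XOR giving the same answer as a pair of comparators. The check below is a sketch under an assumption that the worked example satisfies but the text does not state explicitly: each window size is a power of two and each base address is a multiple of its window size. The function names are hypothetical.

```python
WIDTH = 5
ALL_ONES = (1 << WIDTH) - 1


def hit_by_mask(addr: int, base: int, shift: int) -> bool:
    """Shift-left mask applied to the XOR of address and base."""
    mask = (ALL_ONES << shift) & ALL_ONES
    return ((addr ^ base) & mask) == 0


def hit_by_comparators(addr: int, base: int, shift: int) -> bool:
    """Conventional decode with two inequality comparators."""
    return base <= addr < base + (1 << shift)


# Exhaustive check over every 5-bit address and every aligned window.
mismatches = [
    (addr, base, shift)
    for shift in range(WIDTH + 1)
    for base in range(0, 1 << WIDTH, 1 << shift)
    for addr in range(1 << WIDTH)
    if hit_by_mask(addr, base, shift) != hit_by_comparators(addr, base, shift)
]
print(len(mismatches))  # 0
```

With a base that is not a multiple of the window size, for example base 00100 with a shift of 3, the two functions disagree for address 00000, so this equivalence holds only for aligned windows.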

FIG. 4 is an example of shifting hardware 400 that can be used in any of the embodiments described herein. The shifting hardware 400 can include a plurality of multiplexers 410 coupled in series. Specifically, multiplexers 420, 430 and 440 are coupled in series with different bits of a multiple bit select line controlling the multiplexers. The select lines shown as sel[0], sel[1], and sel[2] can be constant voltages based on a window size of the slave device. The inputs to the multiplexers are either constants, an output of a previous multiplexer, or both. The number of multiplexers needed is greatly reduced (compared to other designs) by using the constant inputs and by using a shift left operation.
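A behavioral model of a three-multiplexer chain is sketched below. It assumes the common logarithmic arrangement in which the stage controlled by sel[k] either passes its input through or shifts it left by 2^k positions; the disclosure does not spell out the per-stage shift amounts, so this arrangement and the function name `mux_chain_shift_left` are assumptions.

```python
from typing import Sequence


def mux_chain_shift_left(operand: int, sel: Sequence[int], width: int) -> int:
    """Shift operand left using one 2:1 multiplexer stage per select bit.

    sel[k] = 1 makes stage k shift by 2**k; zeros are shifted in on
    the right and bits moved past the top of the word are discarded.
    """
    word_mask = (1 << width) - 1
    value = operand & word_mask
    for k, bit in enumerate(sel):
        shifted = (value << (1 << k)) & word_mask
        value = shifted if bit else value  # 2:1 multiplexer
    return value


CONSTANT = 0b11111  # all-ones operand
mask = mux_chain_shift_left(CONSTANT, sel=[1, 1, 0], width=5)  # shift by 3
print(f"{mask:05b}")  # 11000
```

Three select bits cover shift amounts 0 through 7, so the number of stages grows with the logarithm of the maximum shift rather than with the shift itself.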

FIG. 5 is a flowchart of a method for decoding. In process block 510, an address space is divided into multiple regions. Each region includes a base address and a size (also called a window size) that is associated with a number of addresses in the region. Each address can be any number of bytes, such as 1, 2, 4, 8, or 16 bytes. In process block 520, a transaction address is received that is to be decoded. The transaction address is associated with one of the addresses within the different regions and the decoder determines which region. In process block 530, the transaction address is logically combined with a base address of one of the regions of interest to produce a first result. The logical combining can be accomplished using an XOR gate or an XNOR gate. Such logical combining can be a bitwise comparison of the two inputs. In process block 540, a shift left operation can be used wherein a number of bit positions shifted is controlled by a size of the region of address space with which the decoder is associated. The shift-left operation can be the only shifting operation needed to decode an address. Additionally, the shifting can occur in one clock cycle. An operand of the shifting hardware can be a constant and the shifting hardware can shift in logical 0's as the constant is shifted left. An output of the shifting can be a mask value. In process block 550, the mask value can be applied to the first result to obtain a second result that indicates whether the transaction address is within the region of address space associated with the decoder.

FIG. 6 is a flowchart according to a method that can be used to decode an incoming address. The process blocks of FIG. 6 can be additional process blocks to FIG. 5. In process block 610, a shift operation, such as the one described in FIG. 5, is performed in one clock cycle using a single shift-left operation. Additionally, a constant can be used as the operand being shifted. Still further, during the decoding process of the transaction address, the shift-left operation can be the only shift performed. The shifting hardware performing the shift operation can include a plurality of multiplexers coupled in series with select lines of the multiplexers coupled to an input that corresponds to a size of the corresponding region. In process block 620, all of the bits resulting from process block 550 in FIG. 5 can be logically combined to produce a single, one-bit hit/miss result. The logical combining can be accomplished using an OR gate or a NOR gate. The output of the logical combining can be a single bit indicative of a hit or miss.
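Which gate collapses the bits depends on the polarity of the earlier stages. The sketch below contrasts the XOR path described above with one possible XNOR arrangement; the XNOR variant, in which the comparison bits are ORed with the inverted mask and then AND-reduced, is an assumption made here for illustration and is not detailed in the disclosure.

```python
WIDTH = 5
ALL_ONES = (1 << WIDTH) - 1


def hit_xor_nor(addr: int, base: int, shift: int) -> int:
    """XOR compare, AND with the mask, then NOR-reduce to one bit."""
    mask = (ALL_ONES << shift) & ALL_ONES
    second_result = (addr ^ base) & mask
    return int(second_result == 0)          # NOR of all bits


def hit_xnor_and(addr: int, base: int, shift: int) -> int:
    """Assumed XNOR variant: OR with the inverted mask, then AND-reduce."""
    mask = (ALL_ONES << shift) & ALL_ONES
    equal_bits = ~(addr ^ base) & ALL_ONES  # XNOR of each bit pair
    second_result = equal_bits | (~mask & ALL_ONES)
    return int(second_result == ALL_ONES)   # AND of all bits


print(hit_xor_nor(0b10010, 0b10000, 3), hit_xnor_and(0b10010, 0b10000, 3))  # 1 1
print(hit_xor_nor(0b10010, 0b01000, 2), hit_xnor_and(0b10010, 0b01000, 2))  # 0 0
```

Replacing the NOR with an OR, or the AND with a NAND, inverts the single-bit output, so a 1 would then indicate a miss.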

FIG. 7 depicts a generalized example of a suitable computing environment 700 in which the described innovations may be implemented. For example, an integrated circuit herein can be positioned within the computing environment. The computing environment 700 is not intended to suggest any limitation as to scope of use or functionality, as the innovations may be implemented in diverse general-purpose or special-purpose computing systems. For example, the computing environment 700 can be any of a variety of computing devices (e.g., desktop computer, laptop computer, server computer, tablet computer, etc.).

With reference to FIG. 7, the computing environment 700 includes one or more processing units 710, 715 and memory 720. The computing environment can also include an IC 725 that incorporates the decoder technology described herein. In FIG. 7, this basic configuration 730 is included within a dashed line. The processing units 710 and 715 execute computer-executable instructions. A processing unit can be a general-purpose central processing unit (CPU), processor in an application-specific integrated circuit (ASIC) or any other type of processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. For example, FIG. 7 shows a central processing unit 710 as well as a graphics processing unit or co-processing unit 715. The tangible memory 720 may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two, accessible by the processing unit(s). The memory 720 stores software 780 implementing one or more innovations described herein, in the form of computer-executable instructions suitable for execution by the processing unit(s). The software 780 can also be integrated into the IC.

A computing system may have additional features. For example, the computing environment 700 includes storage 740, one or more input devices 750, one or more output devices 760, and one or more communication connections 770. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment 700. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment 700, and coordinates activities of the components of the computing environment 700.

The tangible storage 740 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information in a non-transitory way and which can be accessed within the computing environment 700. The storage 740 stores instructions for the software 780 implementing one or more innovations described herein.

The input device(s) 750 may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment 700. The output device(s) 760 may be a display, printer, speaker, CD-writer, or another device that provides output from the computing environment 700.

The communication connection(s) 770 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can use an electrical, optical, RF, or other carrier.

Although the operations of some of the disclosed methods are described in a particular, sequential order for convenient presentation, it should be understood that this manner of description encompasses rearrangement, unless a particular ordering is required by specific language set forth below. For example, operations described sequentially may in some cases be rearranged or performed concurrently. Moreover, for the sake of simplicity, the attached figures may not show the various ways in which the disclosed methods can be used in conjunction with other methods.

Any of the disclosed methods can be implemented as computer-executable instructions stored on one or more computer-readable storage media (e.g., one or more optical media discs, volatile memory components (such as DRAM or SRAM), or non-volatile memory components (such as flash memory or hard drives)) and executed on a computer (e.g., any commercially available computer, including smart phones or other mobile devices that include computing hardware). The term computer-readable storage media does not include communication connections, such as signals and carrier waves. Any of the computer-executable instructions for implementing the disclosed techniques as well as any data created and used during implementation of the disclosed embodiments can be stored on one or more computer-readable storage media. The computer-executable instructions can be part of, for example, a dedicated software application or a software application that is accessed or downloaded via a web browser or other software application (such as a remote computing application). Such software can be executed, for example, on a single local computer (e.g., any suitable commercially available computer) or in a network environment (e.g., via the Internet, a wide-area network, a local-area network, a client-server network (such as a cloud computing network), or other such network) using one or more network computers.

For clarity, only certain selected aspects of the software or HDL-based implementations are described. Other details that are well known in the art are omitted. For example, it should be understood that the disclosed technology is not limited to any specific computer language or program. For instance, the disclosed technology can be implemented by software written in C++, Java, and the like, as well as any hardware description language (e.g., Verilog, VHDL). Likewise, the disclosed technology is not limited to any particular computer or type of hardware. Certain details of suitable computers and hardware are well known and need not be set forth in detail in this disclosure.

It should also be well understood that any functionality described herein can be performed, at least in part, by one or more hardware logic components, instead of software. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Program-specific Standard Products (PSSPs), System-on-a-chip systems (SoCs), Complex Programmable Logic Devices (CPLDs), Central Processing Units (CPUs), etc.

Furthermore, any of the software-based embodiments (comprising, for example, computer-executable instructions for causing a computer to perform any of the disclosed methods) can be uploaded, downloaded, or remotely accessed through a suitable communication means. Such suitable communication means include, for example, the Internet, the World Wide Web, an intranet, software applications, cable (including fiber optic cable), magnetic communications, electromagnetic communications (including RF, microwave, and infrared communications), electronic communications, or other such communication means.

The disclosed methods, apparatus, and systems should not be construed as limiting in any way. Instead, the present disclosure is directed toward all novel and nonobvious features and aspects of the various disclosed embodiments, alone and in various combinations and subcombinations with one another. The disclosed methods, apparatus, and systems are not limited to any specific aspect or feature or combination thereof, nor do the disclosed embodiments require that any one or more specific advantages be present or problems be solved.

In view of the many possible embodiments to which the principles of the disclosed invention may be applied, it should be recognized that the illustrated embodiments are only preferred examples of the invention and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope of these claims. 

What is claimed is:
 1. An integrated circuit, comprising: a plurality of decoder modules, each decoder module for selecting a region of address space among multiple regions within the integrated circuit, wherein each decoder module includes: shifting hardware having a first shifter input configured to receive a constant value and a second shifter input configured to receive a number of bits that the constant value is to be shifted, wherein the number of bits corresponds to a window size of the region of address space, the shifting hardware configured to produce a mask output; first hardware logic configured to receive a base address of the region of address space on a first hardware-logic input and to receive a transaction address on a second hardware-logic input and to logically combine the base address and the transaction address to produce an output; and second hardware logic configured to combine the output of the first hardware logic and the mask output to produce an output indicative of whether the transaction address is within the region of the address space.
 2. The integrated circuit of claim 1, wherein the number of bits equals the window size minus one.
 3. The integrated circuit of claim 1, wherein the shifting hardware performs a single shift operation to produce the output indicative of whether the transaction address is within the region of address space.
 4. The integrated circuit of claim 1, wherein the first hardware logic includes XOR or XNOR combinatorial logic.
 5. The integrated circuit of claim 1, wherein the second hardware logic includes an AND gate that provides a bitwise logical combination of the output of the first hardware logic and the mask output.
 6. The integrated circuit of claim 5, further including an OR gate or a NOR gate coupled to the bitwise logical combination.
 7. A method, comprising: providing an address space divided into multiple regions of an integrated circuit; receiving a transaction address to be decoded so as to access the address space; logically combining the transaction address with a base address of a first region of the multiple regions of the address space to obtain a first result; shifting a constant input by a number of bits corresponding to a size of the first region of the address space to produce a mask value; and applying the mask value to the first result to obtain a second result, which indicates whether the transaction address is within the first region of address space.
 8. The method of claim 7, wherein the logical combining is accomplished using an XOR gate or an XNOR gate.
 9. The method of claim 7, wherein the constant input is a plurality of bits set to a logical high value and wherein the shifting includes shifting left by the number of bits and shifting in bits set to a logical low value.
 10. The method of claim 7, further including logically combining all bits of the second result using an OR operation, a NOR operation, an AND operation or a NAND operation.
 11. The method of claim 7, further including using the transaction address to determine an offset into the first region of address space.
 12. The method of claim 7, wherein the number of bits is a size of the first region minus one.
 13. The method of claim 7, wherein the shifting is performed in one clock cycle and is the only shifting operation used to obtain the second result.
 14. The method of claim 7, wherein the shifting is performed by a single shift-left module, the constant input is a logical 1, and the shift-left module shifts in a logical 0.
 15. The method of claim 14, wherein the single shift-left module includes a plurality of multiplexers coupled in series with select lines of the multiplexers coupled to an input that corresponds to a size of the first region.
 16. A system, comprising: an address decoder for each region of an address space, the address decoder for a region of address space including: a shift-left module including multiplexers coupled in series with input lines of the multiplexers configured to receive constant voltage inputs and select lines of the multiplexers configured to receive input signals associated with a window size of the region of address space, the shift-left module configured to provide a mask output signal; first combinatorial logic configured to combine a base address of the region of address space and an input address, the input address being decoded to provide an intermediate result; and second combinatorial logic configured to combine the intermediate result and the mask output signal and to provide an output indicating whether the input address being decoded is within the region of address space.
 17. The system of claim 16, wherein the first combinatorial logic includes XOR or XNOR logic gates.
 18. The system of claim 16, wherein the output of the second combinatorial logic is a multiple-bit output, the system further comprising an OR gate, NOR gate, AND gate or NAND gate to combine the multiple-bit output to a single hit or miss output bit.
 19. The system of claim 16, wherein the input signals associated with the window size of the region of address space include a representation of a number of addresses in the region of address space, minus one.
 20. The system of claim 16, wherein the shifting-left module is configured to perform a shift left in one clock cycle and is the only shifting operation used to obtain the intermediate result. 